2021/06/25 17:13
Autoscaling GitLab Runner on AWS EC2
One of the biggest advantages of GitLab Runner is its ability to automatically spin up and down VMs to make sure your builds get processed immediately. It’s a great feature, and if used correctly, it can be extremely useful in situations where you don’t use your runners 24/7 and want to have a cost-effective and scalable solution.
Introduction
In this tutorial, we’ll explore how to properly configure GitLab Runner in AWS. The instance in AWS will serve as a Runner Manager that spawns new Docker instances on demand. The runners on these instances are automatically created. They use the parameters covered in this guide and do not require manual configuration after creation.
In addition, we’ll make use of Amazon’s EC2 Spot instances which will greatly reduce the costs of the GitLab Runner instances while still using quite powerful autoscaling machines.
Prerequisites
A familiarity with Amazon Web Services (AWS) is required as this is where most of the configuration will take place.
We suggest a quick read through Docker machine amazonec2 driver documentation to familiarize yourself with the parameters we will set later in this article.
Your GitLab Runner needs to talk to your GitLab instance over the network, and that is something you need to think about when configuring any AWS security groups or when setting up your DNS configuration.
For example, you can keep the EC2 resources segmented away from public traffic in a different VPC to better strengthen your network security. Your environment is likely different, so consider what works best for your situation.
AWS security groups
Docker Machine will attempt to use a default security group with rules for port 2376 and SSH 22, which is required for communication with the Docker daemon. Instead of relying on Docker, you can create a security group with the rules you need and provide that in the GitLab Runner options as we will see below. This way, you can customize it to your liking ahead of time based on your networking environment. You have to make sure that ports 2376 and 22 are accessible by the Runner Manager instance.
AWS credentials
You’ll need an AWS Access Key tied to a user with permission to scale (EC2) and update the cache (via S3). Create a new user with policies for EC2 (AmazonEC2FullAccess) and S3 (AmazonS3FullAccess). To be more secure, you can disable console login for that user. Keep the tab open or copy the security credentials into an editor, as we’ll use them later during the GitLab Runner configuration.
Prepare the Runner Manager instance
The first step is to install GitLab Runner in an EC2 instance that will serve as the Runner Manager that spawns new machines. Choose a distribution that both Docker and GitLab Runner support, like Ubuntu, Debian, CentOS, or RHEL.
This doesn’t have to be a powerful machine since it will not run any jobs itself, so for your initial configuration you can start with a smaller instance such as a t4g.nano. This machine will be a dedicated host since we need it always up and running, thus it will be the only standard cost.
Install the prerequisites:
- Log in to your server
- Install GitLab Runner from the official GitLab repository
- Install Docker
- Install Docker Machine
Now that the Runner is installed, it’s time to register it.
Registering the GitLab Runner
Before configuring the GitLab Runner, you need to first register it, so that it connects with your GitLab instance:
- Obtain a runner token
- Register the runner
- When asked for the executor type, enter docker+machine
You can now move on to the most important part, configuring the GitLab Runner.
If you want every user in your instance to be able to use the autoscaled runners, register the runner as a shared one.
Configuring the runner
Now that the runner is registered, you need to edit its configuration file and add the required options for the AWS machine driver.
Let’s first break it down to pieces.
The global section
In the global section, you can define the limit of the jobs that can be run concurrently across all runners (concurrent). This heavily depends on your needs, like how many users GitLab Runner will accommodate, how much time your builds take, etc. You can start with something low like 10, and increase or decrease its value going forward.
The check_interval option defines how often the runner should check GitLab for new jobs, in seconds.
Example:
concurrent = 10
check_interval = 0
Other options are also available.
The runners section
From the [[runners]] section, the most important part is the executor which must be set to docker+machine. Most of those settings are taken care of when you register the runner for the first time.
limit sets the maximum number of machines (running and idle) that this runner will spawn. For more information, check the relationship between limit, concurrent and IdleCount.
Example:
[[runners]]
name = "gitlab-aws-autoscaler"
url = "<URL of your GitLab instance>"
token = "<Runner's token>"
executor = "docker+machine"
limit = 20
Other options under [[runners]] are also available.
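The way limit, concurrent, and IdleCount interact is easier to see with numbers. The sketch below is only an approximation based on our reading of the GitLab Runner autoscaling documentation, assuming a single [[runners]] entry: at most concurrent jobs run at once, the manager tries to keep IdleCount spare machines, and a non-zero limit caps the total. The function name max_machines is hypothetical and not part of GitLab Runner.

```python
def max_machines(concurrent: int, limit: int, idle_count: int) -> int:
    """Rough upper bound on machines one runner entry may keep alive.

    Assumption (not an official formula): up to `concurrent` machines run
    jobs, up to `idle_count` more wait idle, and a non-zero `limit` caps
    the total. A `limit` of 0 is treated as "no cap".
    """
    uncapped = concurrent + idle_count
    return uncapped if limit == 0 else min(limit, uncapped)


if __name__ == "__main__":
    # Values from this guide: concurrent = 10, limit = 20, IdleCount = 1.
    print(max_machines(concurrent=10, limit=20, idle_count=1))
    # Same runner with an IdleCount of 50, as in the weekday autoscaling period.
    print(max_machines(concurrent=10, limit=20, idle_count=50))
```

Under this approximation, a large IdleCount makes limit the binding constraint, so it is worth checking the three values against each other before raising any one of them.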
The runners.docker section
In the [runners.docker] section you can define the default Docker image to be used by the child runners if it’s not defined in .gitlab-ci.yml. By using privileged = true, all runners will be able to run Docker in Docker which is useful if you plan to build your own Docker images via GitLab CI/CD.
Next, we use disable_cache = true to disable the Docker executor’s inner cache mechanism since we will use the distributed cache mode as described in the following section.
Example:
[runners.docker]
image = "alpine"
privileged = true
disable_cache = true
Other options under [runners.docker] are also available.
The runners.cache section
To speed up your jobs, GitLab Runner provides a cache mechanism where selected directories and/or files are saved and shared between subsequent jobs. While not required for this setup, it is recommended to use the distributed cache mechanism that GitLab Runner provides. Since new instances will be created on demand, it is essential to have a common place where the cache is stored.
In the following example, we use Amazon S3:
[runners.cache]
Type = "s3"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
AccessKey = "<your AWS Access Key ID>"
SecretKey = "<your AWS Secret Access Key>"
BucketName = "<the bucket where your cache should be kept>"
BucketLocation = "us-east-1"
The runners.machine section
This is the most important part of the configuration and it’s the one that tells GitLab Runner how and when to spawn new or remove old Docker Machine instances.
We will focus on the AWS machine options. For the rest of the settings, read about the:
- Autoscaling algorithm and the parameters it’s based on - depends on the needs of your organization
- Autoscaling periods - useful when there are regular time periods in your organization when no work is done, for example weekends
Here’s an example of the runners.machine section:
[runners.machine]
IdleCount = 1
IdleTime = 1800
MaxBuilds = 10
MachineDriver = "amazonec2"
MachineName = "gitlab-docker-machine-%s"
MachineOptions = [
"amazonec2-access-key=XXXX",
"amazonec2-secret-key=XXXX",
"amazonec2-region=eu-central-1",
"amazonec2-vpc-id=vpc-xxxxx",
"amazonec2-subnet-id=subnet-xxxxx",
"amazonec2-zone=x",
"amazonec2-use-private-address=true",
"amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
"amazonec2-security-group=xxxxx",
"amazonec2-instance-type=m4.2xlarge",
]
[[runners.machine.autoscaling]]
Periods = ["* * 9-17 * * mon-fri *"]
IdleCount = 50
IdleTime = 3600
Timezone = "UTC"
[[runners.machine.autoscaling]]
Periods = ["* * * * * sat,sun *"]
IdleCount = 5
IdleTime = 60
Timezone = "UTC"
The Docker Machine driver is set to amazonec2 and the machine name has a standard prefix followed by %s (required) that is replaced by the ID of the child runner: gitlab-docker-machine-%s.
Now, depending on your AWS infrastructure, there are many options you can set up under MachineOptions. Below you can see the most common ones.
Machine option | Description |
---|---|
amazonec2-access-key=XXXX | The AWS access key of the user that has permissions to create EC2 instances, see AWS credentials. |
amazonec2-secret-key=XXXX | The AWS secret key of the user that has permissions to create EC2 instances, see AWS credentials. |
amazonec2-region=eu-central-1 | The region to use when launching the instance. You can omit this entirely and the default us-east-1 will be used. |
amazonec2-vpc-id=vpc-xxxxx | Your VPC ID to launch the instance in. |
amazonec2-subnet-id=subnet-xxxx | The AWS VPC subnet ID. |
amazonec2-zone=x | If not specified, the availability zone is a. It must be set to the same availability zone as the specified subnet; for example, when the zone is eu-west-1b, use amazonec2-zone=b. |
amazonec2-use-private-address=true | Use the private IP address of Docker Machines, but still create a public IP address. Useful to keep the traffic internal and avoid extra costs. |
amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true | AWS extra tag key-value pairs, useful to identify the instances on the AWS console. The “Name” tag is set to the machine name by default. We set the “runner-manager-name” to match the runner name set in [[runners]], so that we can filter all the EC2 instances created by a specific manager setup. |
amazonec2-security-group=xxxx | AWS VPC security group name, not the security group ID. See AWS security groups. |
amazonec2-instance-type=m4.2xlarge | The instance type that the child runners will run on. |
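Two of the options in the table have a format that is easy to get wrong: amazonec2-tags is a flat comma-separated list of alternating keys and values, and amazonec2-zone takes only the trailing letter of the availability zone. The following is a minimal sketch of both conventions. The helper names parse_ec2_tags and zone_option are hypothetical and not part of Docker Machine or GitLab Runner.

```python
def parse_ec2_tags(value: str) -> dict:
    """Turn "key1,value1,key2,value2" into {"key1": "value1", ...}."""
    parts = value.split(",")
    if len(parts) % 2 != 0:
        raise ValueError("tags must come in key,value pairs")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(parts[0::2], parts[1::2]))


def zone_option(availability_zone: str) -> str:
    """Derive the amazonec2-zone option from a full name such as eu-west-1b.

    Assumes the availability zone name ends with its single-letter suffix.
    """
    return f"amazonec2-zone={availability_zone[-1]}"


if __name__ == "__main__":
    tags = "runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true"
    print(parse_ec2_tags(tags))
    print(zone_option("eu-west-1b"))
```

Parsing the tag string this way before deploying is a cheap check that no key has been left without a value.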
Notes:
- Under MachineOptions you can add anything that the AWS Docker Machine driver supports. You are highly encouraged to read Docker’s docs as your infrastructure setup may warrant different options to be applied.
- The child instances will use Ubuntu 16.04 by default unless you choose a different AMI ID by setting amazonec2-ami. Set only supported base operating systems for Docker Machine.
- If you specify amazonec2-private-address-only=true as one of the machine options, your EC2 instance won’t get assigned a public IP. This is ok if your VPC is configured correctly with an Internet Gateway (IGW) and routing is fine, but it’s something to consider if you’ve got a more complex configuration. Read more in Docker docs about VPC connectivity.
Other options under [runners.machine] are also available.
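To illustrate what the two [[runners.machine.autoscaling]] entries in the example above are meant to achieve, here is a simplified sketch that returns the idle settings in effect at a given moment. It does not parse the cron-style Periods syntax; it hard-codes the two periods from the example. It also assumes that the hour range 9-17 is inclusive and that a later matching entry overrides an earlier one. The function name idle_settings is hypothetical.

```python
from datetime import datetime, timezone

# Defaults from [runners.machine] in the example: (IdleCount, IdleTime).
DEFAULT = (1, 1800)


def idle_settings(now: datetime) -> tuple:
    """Return (IdleCount, IdleTime) for a timezone-aware moment, in UTC.

    Simplified stand-in for the Periods entries of the example config.
    """
    now = now.astimezone(timezone.utc)
    settings = DEFAULT
    if now.weekday() < 5 and 9 <= now.hour <= 17:  # Mon-Fri, 09:00-17:59
        settings = (50, 3600)
    if now.weekday() >= 5:  # Saturday or Sunday
        settings = (5, 60)
    return settings


if __name__ == "__main__":
    friday_afternoon = datetime(2021, 6, 25, 17, 13, tzinfo=timezone.utc)
    print(idle_settings(friday_afternoon))
```

Outside both periods, for example on a weekday evening, the sketch falls back to the IdleCount and IdleTime defined directly under [runners.machine].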
Getting it all together
Here’s the full example of /etc/gitlab-runner/config.toml:
concurrent = 10
check_interval = 0
[[runners]]
name = "gitlab-aws-autoscaler"
url = "<URL of your GitLab instance>"
token = "<runner's token>"
executor = "docker+machine"
limit = 20
[runners.docker]
image = "alpine"
privileged = true
disable_cache = true
[runners.cache]
Type = "s3"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
AccessKey = "<your AWS Access Key ID>"
SecretKey = "<your AWS Secret Access Key>"
BucketName = "<the bucket where your cache should be kept>"
BucketLocation = "us-east-1"
[runners.machine]
IdleCount = 1
IdleTime = 1800
MaxBuilds = 100
MachineDriver = "amazonec2"
MachineName = "gitlab-docker-machine-%s"
MachineOptions = [
"amazonec2-access-key=XXXX",
"amazonec2-secret-key=XXXX",
"amazonec2-region=eu-central-1",
"amazonec2-vpc-id=vpc-xxxxx",
"amazonec2-subnet-id=subnet-xxxxx",
"amazonec2-use-private-address=true",
"amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
"amazonec2-security-group=docker-machine-scaler",
"amazonec2-instance-type=m4.2xlarge",
]
[[runners.machine.autoscaling]]
Periods = ["* * 9-17 * * mon-fri *"]
IdleCount = 50
IdleTime = 3600
Timezone = "UTC"
[[runners.machine.autoscaling]]
Periods = ["* * * * * sat,sun *"]
IdleCount = 5
IdleTime = 60
Timezone = "UTC"
Cutting down costs with Amazon EC2 Spot instances
As described by Amazon:
Amazon EC2 Spot instances allow you to bid on spare Amazon EC2 computing capacity. Since Spot instances are often available at a discount compared to On-Demand pricing, you can significantly reduce the cost of running your applications, grow your application’s compute capacity and throughput for the same budget, and enable new types of cloud computing applications.
In addition to the runners.machine options you picked above, in /etc/gitlab-runner/config.toml under the MachineOptions section, add the following:
MachineOptions = [
"amazonec2-request-spot-instance=true",
"amazonec2-spot-price=",
]
In this configuration with an empty amazonec2-spot-price, AWS sets your bidding price for a Spot instance to the default On-Demand price of that instance class. If you omit the amazonec2-spot-price completely, Docker Machine will set the maximum price to a default value of $0.50 per hour.
You may further customize your Spot instance request:
MachineOptions = [
"amazonec2-request-spot-instance=true",
"amazonec2-spot-price=0.03",
"amazonec2-block-duration-minutes=60"
]
With this configuration, Docker Machines are created using Spot instances with a maximum Spot request price of $0.03 per hour and the duration of the Spot instance is capped at 60 minutes. The 0.03 number mentioned above is just an example, so be sure to check on the current pricing based on the region you picked.
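The three cases described in this section (option omitted, option empty, option set to a number) can be summarized in a short sketch. It only restates the behavior described in the text and is not taken from the Docker Machine source. The function name effective_max_spot_price is hypothetical, and the On-Demand price is passed in by the caller because it depends on your region and instance type.

```python
from typing import Optional

DOCKER_MACHINE_DEFAULT = 0.50  # USD per hour, as stated in the text above


def effective_max_spot_price(machine_options: list, on_demand_price: float) -> Optional[float]:
    """Return the maximum hourly price implied by a MachineOptions list.

    Returns None when Spot instances are not requested at all.
    """
    # Each option has the form "name=value"; the value may be empty.
    opts = dict(opt.split("=", 1) for opt in machine_options)
    if opts.get("amazonec2-request-spot-instance") != "true":
        return None
    if "amazonec2-spot-price" not in opts:
        return DOCKER_MACHINE_DEFAULT  # option omitted entirely
    if opts["amazonec2-spot-price"] == "":
        return on_demand_price  # empty value: On-Demand price
    return float(opts["amazonec2-spot-price"])


if __name__ == "__main__":
    options = ["amazonec2-request-spot-instance=true", "amazonec2-spot-price=0.03"]
    # 0.40 is a made-up On-Demand price used only for illustration.
    print(effective_max_spot_price(options, on_demand_price=0.40))
```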
To learn more about Amazon EC2 Spot instances, visit the following links:
https://aws.amazon.com/ec2/spot/
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html
https://aws.amazon.com/ec2/spot/getting-started/
Caveats of Spot instances
While Spot instances are a great way to use otherwise idle resources and minimize the costs of your infrastructure, you must be aware of the implications.
Running CI jobs on Spot instances may increase the failure rates because of the Spot instances pricing model. If the current Spot price exceeds the maximum Spot price you specify, you will not get the capacity requested. Spot pricing is revised on an hourly basis. Any existing Spot instances that have a maximum price below the revised Spot instance price will be terminated within two minutes and all jobs on Spot hosts will fail.
As a consequence, the autoscale Runner fails to create new machines while it continues to request new instances. It eventually makes 60 requests, after which AWS won’t accept any more. Then, once the Spot price is acceptable again, you are locked out for a while because the request limit has been exceeded.
If you encounter that case, you can use the following command in the Runner Manager machine to see the Docker Machines state:
docker-machine ls -q --filter state=Error --format "{{.NAME}}"
There are some issues regarding making GitLab Runner gracefully handle Spot price changes, and there are reports of docker-machine attempting to continually remove a Docker Machine. GitLab has provided patches for both cases in the upstream project. For more information, see issues #2771 and #2772.
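To make the pricing caveat described earlier in this section concrete, the sketch below walks through a series of hourly Spot prices and reports the hours in which an instance with a given maximum price would be lost. This is a deliberately simplified model, not a reproduction of how AWS schedules terminations. The prices are made up, and the function name interrupted_hours is hypothetical.

```python
def interrupted_hours(max_price: float, hourly_spot_prices: list) -> list:
    """Return the indexes of hours in which the Spot price exceeds max_price.

    Simplified model: the instance is assumed to be lost in any hour whose
    Spot price is above the maximum price, and jobs running then fail.
    """
    return [hour for hour, price in enumerate(hourly_spot_prices) if price > max_price]


if __name__ == "__main__":
    # Made-up prices in USD per hour, purely for illustration.
    prices = [0.021, 0.025, 0.034, 0.029, 0.031]
    print(interrupted_hours(0.03, prices))
```

Running the same check against historical prices for your region and instance type gives a rough idea of how often a given maximum price would have caused job failures.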
Conclusion
In this guide we learned how to install and configure a GitLab Runner in autoscale mode on AWS.
Using the autoscale feature of GitLab Runner can save you both time and money. Using the Spot instances that AWS provides can save you even more, but you must be aware of the implications. As long as your bid is high enough, there shouldn’t be an issue.
You can read the following use cases, which (heavily) influenced this tutorial:
HumanGeo switched from Jenkins to GitLab
Substrakt Health - Autoscale GitLab CI/CD runners and save 90% on EC2 costs